
Localization: stop the AI translator stripping quotes that are part of a translated value#25721

Merged

jkmassel merged 3 commits into jkmassel/claude-string-translation from ai-translator-clean-quote-test on Jun 30, 2026


oguzkocer commented Jun 30, 2026


Targeting #25705.

The AI translation tier could drop quotation marks that are part of a translated value. Both translate_plural and translate_all ran clean(), which removes the cosmetic quotes a model wraps around a raw reply, on values already decoded by JSON.parse, so a value whose own content is quoted (e.g. "Reader") lost its quotes too.

Fix

Run clean() only on the raw single-string reply in translate(). The JSON-decoded plural/batch values are whitespace-trimmed but never quote-stripped — JSON.parse has already removed the structural quotes, so anything left is content. This also covers the async collect_batch path (it shares validated_batch) and curly “…” quotes.
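A minimal Ruby sketch of why the decoded values must not go through clean(), assuming clean() trims whitespace and removes one layer of straight or curly quotes wrapping the whole string. `decode_values_before_fix` and `decode_values_after_fix` are hypothetical names for the two shapes of the decoded-value step; the real helpers are not reproduced here.

```ruby
require 'json'

# Illustrative stand-in for clean() (assumed behavior, not the real code):
# trim, then drop one layer of straight or curly quotes wrapping the reply.
def clean(raw)
  text = raw.strip
  [%w[" "], %w[“ ”]].each do |opening, closing|
    next unless text.length >= 2 && text.start_with?(opening) && text.end_with?(closing)

    return text[1...-1].strip
  end
  text
end

# Pre-fix shape: clean() also ran on values JSON.parse had already decoded.
def decode_values_before_fix(json_reply)
  JSON.parse(json_reply).transform_values { |value| clean(value) }
end

# Post-fix shape: JSON.parse has removed the structural quotes, so any quotes
# still present are content. Only whitespace is trimmed.
def decode_values_after_fix(json_reply)
  JSON.parse(json_reply).transform_values(&:strip)
end

reply = JSON.generate('reader_title' => '"Reader"')
decode_values_before_fix(reply)['reader_title'] # Reader (content quotes lost)
decode_values_after_fix(reply)['reader_title']  # "Reader" (content quotes kept)
clean('  "Lector"  ')                           # Lector (raw reply still cleaned)
```

The single-string path keeps calling clean() because a raw reply has no structural layer that already removed the model's wrapping.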

Test plan

  • Four regression guards fail before the fix and pass after — straight "…" and curly “…” quotes, across both the sync (translate_plural/translate_all) and async (collect_batch) paths.
  • Full ai_translator_test.rb suite green (34 runs, 0 failures); test_returns_cleaned_translation still pins the single-string cosmetic-quote stripping, so the fix has to be path-specific.
  • RuboCop clean.

A translation whose value is itself wrapped in quotation marks must keep them;
only the model's cosmetic wrapping around a raw single-string reply should be
stripped. Cover both structured paths (translate_plural, translate_all).
oguzkocer added this to the 27.1 milestone Jun 30, 2026
oguzkocer requested a review from a team as a code owner Jun 30, 2026 19:32
oguzkocer added the Testing (Unit and UI Tests) and Tooling (Build, Release, and Validation Tools) labels Jun 30, 2026
oguzkocer requested a review from jkmassel Jun 30, 2026 19:43
clean() removes the cosmetic quotes a model wraps around a raw single-string reply. The plural and batch paths ran it on values already decoded by JSON.parse, so a value whose own content is quoted (e.g. "Reader") lost its quotes too.

Run clean() only on the raw single-string reply in translate(); the JSON-decoded plural/batch values are whitespace-trimmed but never quote-stripped, since JSON.parse has already removed the structural quotes and anything left is content.

Also covers the async collect_batch path (shares validated_batch) and curly-quoted values for free. Satisfies the tests added in d923c25.
jkmassel changed the title from "Localization: add failing tests for the AI translator stripping quotes from translated values" to "Localization: stop the AI translator stripping quotes that are part of a translated value" Jun 30, 2026
…ervation tests

Two more regression guards for the clean()-on-decoded-value fix: a curly/smart-quoted value (“Reader”) through translate_all, since clean() strips “ ” as well as straight quotes; and a quoted value through the async collect_batch path, which shares validated_batch with translate_all.

Both fail against the pre-fix code, so a narrower fix — only un-stripping straight quotes, or only the sync path — cannot slip past.
jkmassel merged commit 65bc4df into jkmassel/claude-string-translation Jun 30, 2026
26 checks passed
jkmassel deleted the ai-translator-clean-quote-test branch Jun 30, 2026 20:21
pull Bot pushed a commit to kliu/WordPress-iOS that referenced this pull request Jul 1, 2026
* Localization: AI translation primitives

Reusable, unit-tested Ruby primitives for the AI translation tier of the
localization pipeline — the service behind the `human ?? AI ?? English` floor
whose AI stub was left open in wordpress-mobile#25688. Pure prompt-building and validation with
the Anthropic SDK call injected, so the logic is testable without the gem or the
network. Not wired into any lane yet.

- TranslationValidator: format-specifier safety gate — a translation must
  preserve the source's placeholders (count and type; positional reordering
  allowed), or it is rejected and falls back to English.
- Glossary: brand do-not-translate list plus per-locale terms and register.
- AITranslator: single-string, per-key plural form-set (one consistent stem
  across CLDR forms), and batched string translation, with structured-output
  (output_config) enforcement.
- AnthropicBatch: Message Batches submit/await/results/collect for bulk backfill.

50 unit tests, rubocop clean.
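A rough sketch of the kind of check the TranslationValidator gate performs, under stated assumptions: the specifier regex is a simplified guess covering common printf/NSString forms (`%@`, `%d`, `%ld`, positional `%1$@`), literal `%%` is ignored, positional specifiers may appear in any order, and non-positional ones must keep their order. `valid_translation?` and `placeholder_signature` are illustrative names, not the class's real API.

```ruby
# Simplified, assumed specifier pattern: optional position (%1$), optional l/ll
# length modifier, then a conversion character. Not the validator's real regex.
PLACEHOLDER = /%(?:(\d+)\$)?(l{0,2}[@dDiuUxXoOfeEgGcCsSp])/

# Hypothetical helper: positional specifiers as a sorted multiset (reordering is
# allowed), non-positional ones as an ordered list of conversion types.
def placeholder_signature(text)
  specs = text.gsub('%%', '').scan(PLACEHOLDER).map { |pos, type| [pos&.to_i, type] }
  positional, sequential = specs.partition { |pos, _type| pos }
  [positional.sort, sequential.map(&:last)]
end

# A translation passes only if placeholder count and types match the source.
def valid_translation?(source, translation)
  placeholder_signature(source) == placeholder_signature(translation)
end

valid_translation?('%1$@ liked %2$d posts', '%2$d articles aimés par %1$@') # true
valid_translation?('%d posts', '%@ articles')                                # false (type changed)
valid_translation?('%d posts', 'articles')                                   # false (placeholder dropped)
```

Comparing non-positional specifiers in order is deliberate in this sketch: without a position index, swapping `%@` and `%d` would feed arguments of the wrong type at runtime.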

* Localization: run the AI translation tooling unit tests in CI

The pure-Ruby unit suites (TranslationValidator, Glossary, AnthropicBatch,
AITranslator) weren't executed by any pipeline step — the "Unit Tests" jobs are
the Xcode/XCTest suites, and rubocop (via Danger) only lints them. Add a
lightweight Buildkite step that runs each fastlane/lanes/*_test.rb with plain
ruby (stdlib minitest — no Xcode, no app build, no bundle).

Runs unconditionally rather than behind should-skip-job.sh --job-type validation,
which skips on tooling-only changes — i.e. exactly the PRs that touch these files.
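The step's core amounts to running each test file in its own `ruby` process. The actual Buildkite step definition is not shown in this PR, so the following is only a hypothetical Ruby equivalent; `run_tool_tests` is an illustrative name.

```ruby
require 'rbconfig'

# Hypothetical runner in the spirit of the step described in this commit: run
# every matching test file in its own plain `ruby` process (no Bundler, no Xcode)
# and report which files failed. Returns true only if at least one file ran and
# every file exited successfully.
def run_tool_tests(pattern = 'fastlane/lanes/*_test.rb')
  files = Dir.glob(pattern).sort
  failed = files.reject do |file|
    puts "--- Running #{file}"
    system(RbConfig.ruby, file) # true on exit 0; false or nil otherwise
  end
  warn "Failed: #{failed.join(', ')}" unless failed.empty?
  !files.empty? && failed.empty?
end
```

Every file runs even after a failure, so one log shows all broken suites, and an empty glob counts as a failure so a moved directory cannot turn the step silently green. A CI entry point could call `exit(run_tool_tests ? 0 : 1)`.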

* Localization: correct the for_plural docstring

The previous note advertised for_plural as a one-line swap to wire the live
translation tier. That path routes each plural form through single-string
translate, so it forfeits the cross-form consistency translate_plural exists to
provide — the lemma drift PLURAL_OUTPUT warns about. Relabel for_plural as the
per-cell fallback and point the live-tier wiring at translate_plural's form-set
seam.

* Localization: stop the AI translator stripping quotes that are part of a translated value (wordpress-mobile#25721)

* Localization: assert clean() preserves quotes that are part of a value

A translation whose value is itself wrapped in quotation marks must keep them;
only the model's cosmetic wrapping around a raw single-string reply should be
stripped. Cover both structured paths (translate_plural, translate_all).

* Localization: stop clean() stripping quotes that are part of a value

clean() removes the cosmetic quotes a model wraps around a raw single-string reply. The plural and batch paths ran it on values already decoded by JSON.parse, so a value whose own content is quoted (e.g. "Reader") lost its quotes too.

Run clean() only on the raw single-string reply in translate(); the JSON-decoded plural/batch values are whitespace-trimmed but never quote-stripped, since JSON.parse has already removed the structural quotes and anything left is content.

Also covers the async collect_batch path (shares validated_batch) and curly-quoted values for free. Satisfies the tests added in d923c25.

* Localization: cover curly quotes and the batch path in the quote-preservation tests

Two more regression guards for the clean()-on-decoded-value fix: a curly/smart-quoted value (“Reader”) through translate_all, since clean() strips “ ” as well as straight quotes; and a quoted value through the async collect_batch path, which shares validated_batch with translate_all.

Both fail against the pre-fix code, so a narrower fix — only un-stripping straight quotes, or only the sync path — cannot slip past.

---------

Co-authored-by: Jeremy Massel <1123407+jkmassel@users.noreply.github.com>

* Localization: rename validated_batch/validated_forms to select_valid_*

At call sites the validated_ prefix reads as an adjective — "the batch that's
already been validated" — when both methods are in fact where batch and
plural-set translations run the placeholder gate, returning only the passing
subset. select_valid_batch / select_valid_forms make the filtering action plain
where they're called. Pure rename of two private helpers; no behavior change.
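What the renamed batch helper does can be sketched as follows. `select_valid_batch` here is a hypothetical stand-in with an injectable gate, and its default gate only compares specifiers as an unordered multiset, which is looser than a real placeholder validator would likely be. `select_valid_forms` would presumably apply the same filter to a plural form set.

```ruby
# Simplified specifier pattern for this sketch only (not the real validator's).
SPEC = /%(?:\d+\$)?(?:l{0,2}[dDiuUxXoOfeEgGcCsS@])/

# Stand-in gate: same specifiers, in any order.
PLACEHOLDERS_MATCH = lambda do |source, translation|
  source.scan(SPEC).sort == translation.scan(SPEC).sort
end

# Hypothetical shape of select_valid_batch: run the gate over every returned
# translation and keep only the passing subset. Keys absent from the sources
# (e.g. invented by the model) are dropped too.
def select_valid_batch(sources, translations, gate = PLACEHOLDERS_MATCH)
  translations.select do |key, translated|
    sources.key?(key) && gate.call(sources[key], translated)
  end
end
```

Rejected keys are simply absent from the returned hash, leaving the caller to fall back to English for them.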

---------

Co-authored-by: Oguz Kocer <oguzkocer@users.noreply.github.com>

2 participants